{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ijGzTHJJUCPY"
      },
      "outputs": [],
      "source": [
        "# Copyright 2025 Google LLC\n",
        "#\n",
        "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "#     https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "NDsTUvKjwHBW"
      },
      "source": [
        "# Import from BigQuery into Vector Search\n",
        "\n",
        "<table align=\"left\">\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/embeddings/bigquery-import.ipynb\">\n",
        "      <img width=\"32px\" src=\"https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg\" alt=\"Google Colaboratory logo\"><br> Run in Colab\n",
        "    </a>\n",
        "  </td>\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fembeddings%2Fbigquery-import.ipynb\">\n",
        "      <img width=\"32px\" src=\"https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN\" alt=\"Google Cloud Colab Enterprise logo\"><br> Run in Colab Enterprise\n",
        "    </a>\n",
        "  </td>\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://github.com/GoogleCloudPlatform/generative-ai/blob/main/embeddings/bigquery-import.ipynb\">\n",
        "      <img width=\"32px\" src=\"https://raw.githubusercontent.com/primer/octicons/refs/heads/main/icons/mark-github-24.svg\" alt=\"GitHub logo\"><br> View on GitHub\n",
        "    </a>\n",
        "  </td>\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/embeddings/bigquery-import.ipynb\">\n",
        "      <img src=\"https://www.gstatic.com/images/branding/gcpiconscolors/vertexai/v1/32px.svg\" alt=\"Vertex AI logo\"><br> Open in Vertex AI Workbench\n",
        "    </a>\n",
        "  </td>\n",
        "  <td style=\"text-align: center\">\n",
        "</table>\n",
        "\n",
        "<div style=\"clear: both;\"></div>\n",
        "\n",
        "<b>Share to:</b>\n",
        "\n",
        "<a href=\"https://www.linkedin.com/sharing/share-offsite/?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/embeddings/bigquery-import.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/8/81/LinkedIn_icon.svg\" alt=\"LinkedIn logo\">\n",
        "</a>\n",
        "\n",
        "<a href=\"https://bsky.app/intent/compose?text=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/embeddings/bigquery-import.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/7/7a/Bluesky_Logo.svg\" alt=\"Bluesky logo\">\n",
        "</a>\n",
        "\n",
        "<a href=\"https://twitter.com/intent/tweet?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/embeddings/bigquery-import.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/5a/X_icon_2.svg\" alt=\"X logo\">\n",
        "</a>\n",
        "\n",
        "<a href=\"https://reddit.com/submit?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/embeddings/bigquery-import.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://redditinc.com/hubfs/Reddit%20Inc/Brand/Reddit_Logo.png\" alt=\"Reddit logo\">\n",
        "</a>\n",
        "\n",
        "<a href=\"https://www.facebook.com/sharer/sharer.php?u=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/embeddings/bigquery-import.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/51/Facebook_f_logo_%282019%29.svg\" alt=\"Facebook logo\">\n",
        "</a>            "
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "4uoRmYQsKBgl"
      },
      "source": [
        "| Authors |\n",
        "| --- |\n",
        "| [Eric Gribkoff](https://github.com/ericgribkoff) |"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "RQT500QqVPIb"
      },
      "source": [
        "### Objectives\n",
        "\n",
        "In this notebook, you will learn how to import vector embedding data from a BigQuery data source into a Vector Search index."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "DXJpXzKrh2rJ"
      },
      "source": [
        "## Getting Started"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "FtsU9Bw9h2rL"
      },
      "source": [
        "### Authenticate your notebook environment (Colab only)\n",
        "\n",
        "If you are running this notebook on Google Colab, run the following cell to authenticate your environment. This step is not required if you are using [Vertex AI Workbench](https://cloud.google.com/vertex-ai-workbench)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "GpYEyLsOh2rL"
      },
      "outputs": [],
      "source": [
        "import sys\n",
        "\n",
        "# Additional authentication is required for Google Colab\n",
        "if \"google.colab\" in sys.modules:\n",
        "    # Authenticate user to Google Cloud\n",
        "    from google.colab import auth\n",
        "\n",
        "    auth.authenticate_user()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "O1vKZZoEh2rL"
      },
      "source": [
        "### Set Google Cloud project information and initialize Vertex AI SDK\n",
        "\n",
        "To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com) and [enable the BigQuery API](https://console.cloud.google.com/flows/enableapi?apiid=bigquery.googleapis.com).\n",
        "\n",
        "Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "gJqZ76rJh2rM"
      },
      "outputs": [],
      "source": [
        "# Use the environment variable if the user doesn't provide Project ID.\n",
        "import os\n",
        "\n",
        "PROJECT_ID = \"your-project-id\"  # @param {type: \"string\", placeholder: \"[your-project-id]\", isTemplate: true}\n",
        "if not PROJECT_ID or PROJECT_ID == \"[your-project-id]\":\n",
        "    PROJECT_ID = str(os.environ.get(\"GOOGLE_CLOUD_PROJECT\"))\n",
        "LOCATION = os.environ.get(\"GOOGLE_CLOUD_REGION\", \"us-central1\")\n",
        "\n",
        "# Initialize the aiplatform package\n",
        "from google.cloud import aiplatform\n",
        "\n",
        "aiplatform.init(project=PROJECT_ID, location=LOCATION)\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "2176030e88a0"
      },
      "source": [
        "## Generate sample BigQuery data\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "beaMkDH9zlr2"
      },
      "outputs": [],
      "source": [
        "%%bigquery --project $PROJECT_ID\n",
        "CREATE SCHEMA import_example_dataset; -- OPTIONS (location=$LOCATION);\n",
        "CREATE TABLE import_example_dataset.test_table (\n",
        "  id INTEGER,\n",
        "  embedding ARRAY <FLOAT64>,\n",
        "  allow_column STRING,\n",
        "  deny_column STRING,\n",
        "  int_column INTEGER,\n",
        "  float_column FLOAT64,\n",
        "  metadata_column STRING\n",
        ");\n",
        "INSERT INTO import_example_dataset.test_table(id, embedding, allow_column, deny_column, int_column, float_column, metadata_column) VALUES\n",
        "(1, [0.1, 0.1, 0.1], \"allow1\", \"deny1\", 1, 0.1, \"metadata1\"),\n",
        "(2, [0.2, 0.2, 0.2], \"allow2\", \"deny2\", 2, 0.2, \"metadata2\");\n",
        "SELECT * FROM import_example_dataset.test_table;"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7a2d7f3e3a33"
      },
      "source": [
        "## Create a Vector Search Index\n",
        "\n",
        "This may take a few moments to finish the index creation."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "10kzx4YoVjMU"
      },
      "outputs": [],
      "source": [
        "my_index = aiplatform.MatchingEngineIndex.create_tree_ah_index(\n",
        "    display_name='import_test_index_name',\n",
        "    dimensions=3,\n",
        "    approximate_neighbors_count=10,\n",
        "    index_update_method=\"BATCH_UPDATE\",\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dvqIfG421PW7"
      },
      "source": [
        "## Import from BigQuery into Vector Search\n",
        "\n",
        "This returns a Long-Running Operation (LRO), which allows you to track the progress of the import operation."
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "import requests\n",
        "\n",
        "# Get token to use for REST request\n",
        "gcloud_token = !gcloud auth print-access-token\n",
        "\n",
        "url = f\"https://{LOCATION}-aiplatform.googleapis.com/v1beta1/{my_index.resource_name}:import\"\n",
        "headers = {\n",
        "    \"Content-Type\": \"application/json\",\n",
        "    \"Authorization\": f\"Bearer {gcloud_token[0]}\"\n",
        "}\n",
        "request = {\n",
        "  \"is_complete_overwrite\": True,\n",
        "  \"config\": {\n",
        "    \"big_query_source_config\": {\n",
        "      \"table_path\": f\"bq://{PROJECT_ID}.import_example_dataset.test_table\",\n",
        "      \"datapoint_field_mapping\": {\n",
        "        \"id_column\": \"id\",\n",
        "        \"embedding_column\": \"embedding\",\n",
        "        \"restricts\": [\n",
        "          {\"namespace\": \"restrict\", \"allow_column\": [\"allow_column\"], \"deny_column\": [\"deny_column\"]},\n",
        "        ],\n",
        "        \"numeric_restricts\": [\n",
        "          {\"namespace\": \"int_restrict\", \"value_column\": \"int_column\", \"value_type\": \"INT\"},\n",
        "          {\"namespace\": \"float_restrict\", \"value_column\": \"float_column\", \"value_type\": \"FLOAT\"}\n",
        "        ]\n",
        "        # The import may include metadata if your project has been allow-listed\n",
        "        # for the VS metadata preview.\n",
        "        #\"embedding_metadata\": \"metadata_column\"\n",
        "      }\n",
        "    }\n",
        "  }\n",
        "}\n",
        "\n",
        "try:\n",
        "    # Make the POST request\n",
        "    response = requests.post(url, headers=headers, json=request)\n",
        "\n",
        "    # Check the response status code\n",
        "    if response.status_code == 200:\n",
        "        print(\"Import request successful!\")\n",
        "        print(response.json())\n",
        "    else:\n",
        "        print(f\"Import request failed with status code: {response.status_code}\")\n",
        "        print(response.text)\n",
        "\n",
        "except requests.exceptions.RequestException as e:\n",
        "    print(f\"An error occurred: {e}\")\n"
      ],
      "metadata": {
        "id": "S0ikA0XYq15Y"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "The code below will query the API for the status of the import LRO.\n",
        "\n"
      ],
      "metadata": {
        "id": "U_oOwFdAv2Wv"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "operation = response.json()['name']\n",
        "response = requests.get(f\"https://{LOCATION}-aiplatform.googleapis.com/v1beta1/{operation}\", headers=headers)\n",
        "\n",
        "if 'done' in response.json():\n",
        "  if 'error' in response.json():\n",
        "    print(\"Import failed\")\n",
        "    print(response.json()['error'])\n",
        "  else:\n",
        "    print(\"Import succeeded!\")\n",
        "    print(response.json())\n",
        "else:\n",
        "    print(\"Import still in progress\")\n",
        "print(response.json())\n"
      ],
      "metadata": {
        "id": "7SZZY8a4vS2w"
      },
      "execution_count": null,
      "outputs": []
    }
  ],
  "metadata": {
    "colab": {
      "toc_visible": true,
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
